Implementation of DAWG

Author

  • Miroslav Balík
Abstract

Let T be a text over a fixed alphabet A. An automaton that accepts all substrings occurring in T can then be constructed in linear time. The ratio of the size of an implementation of this automaton (the factor automaton, or DAWG) to the size of the input text is typically 14:1. This paper presents a method of implementing the DAWG that reduces this ratio to 4:1 while preserving the automaton's desirable properties: construction in time linear in the length of the input text, and checking whether a pattern occurs in the text in time linear in the length of the pattern.
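To make the two linear-time properties in the abstract concrete, here is a minimal sketch of the standard online construction of a factor automaton (often called a suffix automaton). It is the well-known textbook algorithm, not the space-efficient implementation this paper describes. The names `State`, `build_dawg`, and `contains` are hypothetical and chosen only for illustration.

```python
class State:
    """One automaton state: outgoing edges, suffix link, longest-string length."""
    __slots__ = ("next", "link", "length")

    def __init__(self, length=0, link=-1):
        self.next = {}        # symbol -> index of target state
        self.link = link      # suffix link (-1 for the initial state)
        self.length = length  # length of the longest string reaching this state


def build_dawg(text):
    """Build the automaton for `text` one symbol at a time (illustrative sketch)."""
    states = [State()]  # state 0 is the initial state
    last = 0            # state corresponding to the whole prefix read so far
    for ch in text:
        cur = len(states)
        states.append(State(length=states[last].length + 1))
        p = last
        # Add the new edge along the suffix-link chain where it is missing.
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[p].length + 1 == states[q].length:
                states[cur].link = q
            else:
                # Split q: the clone keeps q's edges but has a shorter length.
                clone = len(states)
                copy = State(length=states[p].length + 1, link=states[q].link)
                copy.next = dict(states[q].next)
                states.append(copy)
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states


def contains(states, pattern):
    """Return True if `pattern` is a substring of the indexed text.

    Follows one edge per pattern symbol, so the number of steps is
    linear in the length of the pattern.
    """
    s = 0
    for ch in pattern:
        s = states[s].next.get(ch)
        if s is None:
            return False
    return True


dawg = build_dawg("abcbc")
print(contains(dawg, "bcb"))  # substring of "abcbc"
print(contains(dawg, "cc"))   # not a substring
```

This sketch stores a dictionary of edges per state, which is convenient but memory-hungry. Per-state overhead of this kind is what makes naive implementations many times larger than the input text, and it is the cost a compact implementation aims to cut.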


Similar resources

Implementation of directed acyclic word graph

An effective implementation of a Directed Acyclic Word Graph (DAWG) automaton is shown. A DAWG for a text T is a minimal automaton that accepts all substrings of T, so it represents a complete index of the text. While usual implementations of DAWG need about 30 times more storage space than the text itself, here we show an implementation that decreases this requirement d...


Dictionary Representation Using Efficient Dawg Implementation

The huge amount of information stored in a dictionary has increased the need for text compression. The amount of compression that can be obtained using current techniques is usually a tradeoff between speed and the amount of memory required. There is a considerable potential for savings to be made by the use of compression. Although hash tables are widely used, a trie structure is more appropria...


DNA assembly with gaps (Dawg): simulating sequence evolution

MOTIVATION Relationships amongst taxa are inferred from biological data using phylogenetic methods and procedures. Very few known phylogenies exist against which to test the accuracy of our inferences. Therefore, in the absence of biological data, simulated data must be used to test the accuracy of methods which produce these inferences. Researchers have limited or non-existent options for simu...


Direct Construction of Compact Directed Acyclic Word Graphs

The Directed Acyclic Word Graph (DAWG) is an efficient data structure to treat and analyze repetitions in a text, especially in DNA genomic sequences. Here, we consider the Compact Directed Acyclic Word Graph of a word. We give the first direct algorithm to construct it. It runs in time linear in the length of the string on a fixed alphabet. Our implementation requires half the memory space used by D...


Developing a Policy Framework for Digital Preservation

The Arts and Humanities Data Service (AHDS) has been established by the Joint Information Systems Committee of the UK's Higher Education Funding Councils to collect, preserve and promote re-use of digital resources which result from or support research and teaching in the arts and humanities. Within the UK, the Digital Archiving Working Group (DAWG) has been formed to co-ordinate research into ...



Publication date: 1998